Proportional numbers in each combination of category levels
3 old females, 3 old males, 3 young females, 3 young males
6 old females, 6 old males, 2 young females, 2 young males
Unbalanced data
Unequal numbers in some category combinations
3 old females, 3 old males, 3 young females, 2 young males
Extreme case: empty category combinations
3 old females, 3 old males, 3 young females, 0 young males
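A quick way to see whether a design is balanced is to cross-tabulate the two categories. The sketch below uses base R and a hypothetical data frame named `design` that reproduces the unbalanced example above (3, 3, 3, 2); the variable names are assumptions for illustration.

```r
# Hypothetical unbalanced design from the example above:
# 3 old females, 3 old males, 3 young females, 2 young males.
design <- data.frame(
  age    = c(rep("old", 6), rep("young", 5)),
  gender = c(rep("female", 3), rep("male", 3),
             rep("female", 3), rep("male", 2))
)

# Cross-tabulate to count the patients in each category combination.
cell_counts <- table(design$age, design$gender)
print(cell_counts)

# Balanced (equal n) means every cell has the same count.
is_balanced <- length(unique(as.vector(cell_counts))) == 1

# An empty cell is the extreme case of unbalanced data.
has_empty_cell <- any(cell_counts == 0)
```

Here `is_balanced` is `FALSE` because one cell has 2 patients while the others have 3, and `has_empty_cell` is `FALSE` because no cell is empty.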
Historical perspective on unbalanced data, 1
Historical perspective on unbalanced data, 2
Historical perspective on unbalanced data, 3
Historical perspective on unbalanced data, 4
A simple illustration of the complexities of unbalanced data, 1
A simple illustration of the complexities of unbalanced data, 2
A simple illustration of the complexities of unbalanced data, 3
Mathematical model, 1
\(Y_{ijk}\)
i = which level of first category
j = which level of second category
k = which patient within a category combination
Mathematical model, 2
\(\begin{smallmatrix} Age & Gender & Outcome \\ Old & Female & Y_{111} \\ Old & Female & Y_{112} \\ Old & Female & Y_{113} \\ Old & Male & Y_{121} \\ Old & Male & Y_{122} \\ Old & Male & Y_{123} \\ Young & Female & Y_{211} \\ Young & Female & Y_{212} \\ Young & Female & Y_{213} \\ Young & Male & Y_{221} \\ Young & Male & Y_{222} \\ Young & Male & Y_{223}\end{smallmatrix}\)
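With this notation, one common way to write a two-factor model without interaction is shown below. This is an assumed, standard textbook form rather than something stated on the slides; the symbols \(\mu\), \(\alpha_i\), \(\beta_j\), and \(\epsilon_{ijk}\) are introduced here for illustration only.

```latex
\[
Y_{ijk} = \mu + \alpha_i + \beta_j + \epsilon_{ijk},
\qquad i = 1, 2; \quad j = 1, 2; \quad k = 1, 2, 3
\]
```

Here \(\alpha_i\) would be the effect of age level \(i\), \(\beta_j\) the effect of gender level \(j\), and \(\epsilon_{ijk}\) the patient-level error.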
# A tibble: 12 × 5
id age gender code db
<int> <chr> <chr> <chr> <dbl>
1 1 old female of 45
2 2 old female of 60
3 3 old female of 60
4 4 old male om 65
5 5 old male om 60
6 6 old male om 70
7 7 young female yf 20
8 8 young female yf 20
9 9 young female yf 5
10 10 young male ym 25
11 11 young male ym 20
12 12 young male ym 30
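The artificial data set above can be rebuilt in a few lines. This is a minimal sketch in base R (the slides print a tibble, so the original was presumably built with the tidyverse); the name `hearing` is taken from the model calls later in the deck.

```r
# Rebuild the 12-row artificial data set shown above.
hearing <- data.frame(
  id     = 1:12,
  age    = rep(c("old", "young"), each = 6),
  gender = rep(rep(c("female", "male"), each = 3), times = 2),
  db     = c(45, 60, 60, 65, 60, 70, 20, 20, 5, 25, 20, 30)
)

# Two-letter code for each category combination: of, om, yf, ym.
hearing$code <- paste0(substr(hearing$age, 1, 1),
                       substr(hearing$gender, 1, 1))

# Match the column order of the printed table.
hearing <- hearing[, c("id", "age", "gender", "code", "db")]

# Three patients in every combination, so the design is balanced.
print(table(hearing$code))
```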
Artificial data with means
# A tibble: 12 × 8
id age gender code db age_mean gender_mean overall_mean
<int> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 old female of 45 60 35 40
2 2 old female of 60 60 35 40
3 3 old female of 60 60 35 40
4 4 old male om 65 60 45 40
5 5 old male om 60 60 45 40
6 6 old male om 70 60 45 40
7 7 young female yf 20 20 35 40
8 8 young female yf 20 20 35 40
9 9 young female yf 5 20 35 40
10 10 young male ym 25 20 45 40
11 11 young male ym 20 20 45 40
12 12 young male ym 30 20 45 40
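The three mean columns can be reproduced with `ave()`, which returns the group mean repeated on every row of the group. This is a base R sketch, not necessarily how the slides computed them.

```r
# Artificial hearing data from the table above.
hearing <- data.frame(
  age    = rep(c("old", "young"), each = 6),
  gender = rep(rep(c("female", "male"), each = 3), times = 2),
  db     = c(45, 60, 60, 65, 60, 70, 20, 20, 5, 25, 20, 30)
)

# ave() defaults to the mean within each level of the grouping variable.
hearing$age_mean     <- ave(hearing$db, hearing$age)     # 60 for old, 20 for young
hearing$gender_mean  <- ave(hearing$db, hearing$gender)  # 35 for female, 45 for male
hearing$overall_mean <- mean(hearing$db)                 # 40 for every row

print(hearing)
```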
SS(Total)
SS(Age)
SS(Gender)
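The sums of squares can be computed directly from the mean columns. The sketch below assumes the usual definitions (squared deviations from the overall mean, summed over all 12 patients) and checks them against the analysis of variance table; getting the residual by subtraction works here because the design is balanced.

```r
# Artificial hearing data with the mean columns from the table above.
hearing <- data.frame(
  age    = rep(c("old", "young"), each = 6),
  gender = rep(rep(c("female", "male"), each = 3), times = 2),
  db     = c(45, 60, 60, 65, 60, 70, 20, 20, 5, 25, 20, 30)
)
age_mean     <- ave(hearing$db, hearing$age)
gender_mean  <- ave(hearing$db, hearing$gender)
overall_mean <- mean(hearing$db)

# Total variation: each observation versus the overall mean.
ss_total  <- sum((hearing$db - overall_mean)^2)   # 5500

# Variation explained by age: age mean versus the overall mean.
ss_age    <- sum((age_mean - overall_mean)^2)     # 4800

# Variation explained by gender: gender mean versus the overall mean.
ss_gender <- sum((gender_mean - overall_mean)^2)  # 300

# What is left over after age and gender (balanced design only).
ss_resid  <- ss_total - ss_age - ss_gender        # 400
```

These values match the Sum Sq column of the analysis of variance table: 4800 for age, 300 for gender, and 400 for residuals.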
Analysis of variance table
Analysis of Variance Table
Response: db
Df Sum Sq Mean Sq F value Pr(>F)
age 1 4800 4800.0 108.00 2.595e-06 ***
gender 1 300 300.0 6.75 0.02883 *
Residuals 9 400 44.4
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Break #1
What you have learned
Two factor analysis of variance
What’s coming next
Relationship to linear regression
Create indicator variables
# A tibble: 12 × 6
age gender code i_young i_male db
<chr> <chr> <chr> <dbl> <dbl> <dbl>
1 old female of 0 0 45
2 old female of 0 0 60
3 old female of 0 0 60
4 old male om 0 1 65
5 old male om 0 1 60
6 old male om 0 1 70
7 young female yf 1 0 20
8 young female yf 1 0 20
9 young female yf 1 0 5
10 young male ym 1 1 25
11 young male ym 1 1 20
12 young male ym 1 1 30
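The indicator columns in the table above are 1 when the condition holds and 0 otherwise. A minimal base R sketch of how they could be created:

```r
# Artificial hearing data from the table above.
hearing <- data.frame(
  age    = rep(c("old", "young"), each = 6),
  gender = rep(rep(c("female", "male"), each = 3), times = 2),
  db     = c(45, 60, 60, 65, 60, 70, 20, 20, 5, 25, 20, 30)
)

# A logical comparison converted to numeric gives a 0/1 indicator.
hearing$i_young <- as.numeric(hearing$age == "young")
hearing$i_male  <- as.numeric(hearing$gender == "male")

# Old females are the reference group: both indicators are 0.
print(hearing)
```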
Two factor analysis of variance using aov
m1 <- aov(db ~ age + gender, data = hearing)
anova(m1)
Analysis of Variance Table
Response: db
Df Sum Sq Mean Sq F value Pr(>F)
age 1 4800 4800.0 108.00 2.595e-06 ***
gender 1 300 300.0 6.75 0.02883 *
Residuals 9 400 44.4
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Two factor analysis of variance using linear regression, 1
m2 <- lm(db ~ age + gender, data = hearing)
anova(m2)
Analysis of Variance Table
Response: db
Df Sum Sq Mean Sq F value Pr(>F)
age 1 4800 4800.0 108.00 2.595e-06 ***
gender 1 300 300.0 6.75 0.02883 *
Residuals 9 400 44.4
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Two factor analysis of variance using linear regression, 2
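One way to see the link to regression is to fit the same model with the 0/1 indicator variables instead of the character columns. This is a sketch under that assumption; the model name `m3` is hypothetical and does not come from the slides.

```r
# Artificial hearing data with indicator variables.
hearing <- data.frame(
  age    = rep(c("old", "young"), each = 6),
  gender = rep(rep(c("female", "male"), each = 3), times = 2),
  db     = c(45, 60, 60, 65, 60, 70, 20, 20, 5, 25, 20, 30)
)
hearing$i_young <- as.numeric(hearing$age == "young")
hearing$i_male  <- as.numeric(hearing$gender == "male")

# Regression on the categorical columns (R builds indicators internally).
m2 <- lm(db ~ age + gender, data = hearing)

# Regression on the hand-made indicators (hypothetical name m3).
m3 <- lm(db ~ i_young + i_male, data = hearing)

# Intercept 55 = fitted value for old females,
# i_young -40 = young minus old, i_male 10 = male minus female.
print(coef(m3))

# Both fits give the same sums of squares: 4800, 300, 400.
print(anova(m3))
same_ss <- isTRUE(all.equal(anova(m2)[["Sum Sq"]], anova(m3)[["Sum Sq"]]))
```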
Impact of one variable is influenced by a second variable
Example: alcohol changes the effect of sleeping pills
Three types of interactions
Between two categorical predictors
Between a categorical and a continuous predictor
Between two continuous predictors
Interactions greatly complicate interpretation
Interaction plot
X axis, first categorical variable
Separate lines for second categorical variable
Y axis, average outcome
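Base R's `interaction.plot()` follows this layout directly. The sketch below applies it to the artificial hearing data as an illustration; the choice of age on the X axis and gender for the separate lines is an assumption.

```r
# Artificial hearing data from earlier in the deck.
hearing <- data.frame(
  age    = rep(c("old", "young"), each = 6),
  gender = rep(rep(c("female", "male"), each = 3), times = 2),
  db     = c(45, 60, 60, 65, 60, 70, 20, 20, 5, 25, 20, 30)
)

# The plotted points are the average outcome in each category combination.
cell_means <- tapply(hearing$db, list(hearing$age, hearing$gender), mean)
print(cell_means)

# X axis: first categorical variable; one line per level of the second;
# Y axis: average outcome.
interaction.plot(
  x.factor     = factor(hearing$age),
  trace.factor = factor(hearing$gender),
  response     = hearing$db,
  fun          = mean,
  xlab         = "Age",
  ylab         = "Mean db",
  trace.label  = "Gender"
)
```

For these data the two lines are parallel (males are 10 higher than females at both ages), which is the pattern expected when there is no interaction.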
Hypothetical interaction plots, 1
No interaction
Ineffective treatment
Boys/girls similar
No interaction
Ineffective treatment
Boys fare better than girls
Hypothetical interaction plots, 2
No interaction
Effective treatment
Boys/girls similar
No interaction
Effective treatment
Boys fare better than girls
Hypothetical interaction plots, 3
Significant interaction
Harmful treatment in boys
Effective treatment in girls
Significant interaction
Ineffective treatment in boys
Effective treatment in girls
Hypothetical interaction plots, 4
Significant interaction
Girls fare better overall
Effective treatment
Much more effective in boys
Indicator variable for an interaction
# A tibble: 12 × 7
age gender code i_young i_male i_m_by_y db
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 old female of 0 0 0 45
2 old female of 0 0 0 60
3 old female of 0 0 0 60
4 old male om 0 1 0 65
5 old male om 0 1 0 60
6 old male om 0 1 0 70
7 young female yf 1 0 0 20
8 young female yf 1 0 0 20
9 young female yf 1 0 0 5
10 young male ym 1 1 1 25
11 young male ym 1 1 1 20
12 young male ym 1 1 1 30
Interpretation of intercept and slopes
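A worked sketch of the interpretation, using the interaction indicator from the table above (the product of `i_young` and `i_male`). The model name `m4` is hypothetical; the coefficient meanings follow from old females being the group where all three indicators are 0.

```r
# Artificial hearing data with main-effect and interaction indicators.
hearing <- data.frame(
  age    = rep(c("old", "young"), each = 6),
  gender = rep(rep(c("female", "male"), each = 3), times = 2),
  db     = c(45, 60, 60, 65, 60, 70, 20, 20, 5, 25, 20, 30)
)
hearing$i_young  <- as.numeric(hearing$age == "young")
hearing$i_male   <- as.numeric(hearing$gender == "male")
hearing$i_m_by_y <- hearing$i_young * hearing$i_male  # 1 only for young males

# Regression with both indicators and their product (hypothetical name m4).
m4 <- lm(db ~ i_young + i_male + i_m_by_y, data = hearing)
b  <- coef(m4)
print(round(b, 2))

# Intercept: mean for old females (all indicators 0)              -> 55
# i_young:   young females minus old females                      -> -40
# i_male:    old males minus old females                          -> 10
# i_m_by_y:  how much the male-female gap changes in the young    -> about 0
cell_means <- tapply(hearing$db, list(hearing$age, hearing$gender), mean)
```

In this artificial data set the interaction coefficient is essentially zero, because the male-female difference is 10 in both age groups.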
When you can’t estimate an interaction
Special case, n=1
Only one observation per category combination
Example, full moon study, 1 of 2
# A tibble: 36 × 3
month1 moon1 n
<fct> <fct> <int>
1 Aug Before 1
2 Aug During 1
3 Aug After 1
4 Sep Before 1
5 Sep During 1
6 Sep After 1
7 Oct Before 1
8 Oct During 1
9 Oct After 1
10 Nov Before 1
# ℹ 26 more rows
Analysis of Variance Table
Response: admission
Df Sum Sq Mean Sq F value Pr(>F)
month 11 455.58 41.417 NaN NaN
moon 2 41.51 20.757 NaN NaN
month:moon 22 127.82 5.810 NaN NaN
Residuals 0 0.00 NaN
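The NaN values appear because a model with an interaction uses one parameter per category combination, leaving zero residual degrees of freedom when each combination has a single observation. The sketch below shows the same thing on a small hypothetical subset: the first patient from each of the four cells of the artificial hearing data (not the full moon data, which are not reproduced here).

```r
# One observation per category combination (first patient in each cell
# of the artificial hearing data; used here only as an illustration).
one_per_cell <- data.frame(
  age    = c("old", "old", "young", "young"),
  gender = c("female", "male", "female", "male"),
  db     = c(45, 65, 20, 25)
)

# With the interaction: 4 parameters for 4 observations.
m_full <- lm(db ~ age * gender, data = one_per_cell)
n_params_full <- length(coef(m_full))  # 4
df_full       <- df.residual(m_full)   # 0, so no error term for F tests

# Without the interaction: 3 parameters, 1 residual degree of freedom.
m_add  <- lm(db ~ age + gender, data = one_per_cell)
df_add <- df.residual(m_add)           # 1
```

Dropping the interaction term, as in `m_add`, is one common way to recover residual degrees of freedom, at the cost of assuming that no interaction exists.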